Scheduling Multithreaded Programming Instructions Based on Dependency Graph

ABSTRACT

A computer implemented method for scheduling multithreaded programming instructions based on a dependency graph, wherein the dependency graph logically organizes the programming instructions into blocks, nodes, and super blocks, and wherein programming instructions that could be executed outside of a critical section are scheduled outside of the critical section by inserting dependency relationships into the dependency graph.

TECHNICAL FIELD

Embodiments of the present invention relate to scheduling the execution of a sequence of programming instructions based on a dependency graph.

BACKGROUND

Multithreading and multiprocessing are common programming techniques often used to maximize the efficiency of computer programs by permitting concurrency or multitasking. Threads allow a computer program to be divided into multiple distinct sequences of programming instructions, where each sequence is treated as a single task and the sequences may be processed simultaneously. One application that may use the multithreaded programming technique is a packet-switched network application that processes network packets concurrently in a high-speed packet-switched system.

To maintain and organize the different packets, a new thread may be created for each incoming packet. In a single processor environment, the processor may divide its time between different threads. In a multiprocessor environment, different threads may be processed on different processors. For example, the Intel™ IXA network processors (IXPs) have multiple microengines (MEs) processing network packets in parallel, where each ME supports multiple threads.

Network performance in processing these packets depends on the time required to process a packet; the faster a packet can be processed, the more efficient a switch is. The service time of a switch usually refers to the time between the arrival and the departure of a packet. When a packet arrives, a series of tasks such as receiving the packet, looking up the routing table, and queuing may be performed by the new thread to service the packet. Resource access latency usually refers to the time delay between the instant when a resource access, such as a memory access, is initiated and the instant when the accessed data in the resource becomes available. For example, the time it takes to perform a routing table look-up is resource access latency. In many instances, resource access latency takes up the majority of the service time in processing a packet.

In a multithreaded environment, a processor that would otherwise be idle during resource access latency may be used to execute a different thread. Overlapping the execution of the different thread with the latency of the previous thread is usually referred to as resource access latency overlapping or resource access latency hiding. Multiple threads may access the same resource concurrently if one thread does not depend on another thread. The following example demonstrates a dependency relationship between two instructions, as well as resource access latency overlapping and hiding.

FIG. 1 a depicts a sequence of programming instructions N₁ to N_(k+2). Instruction N₁ loads the data stored in memory location R2 into memory or register R1. After R1 is loaded with the data from memory location R2, instruction N₁ asserts a signal s. Instructions N₂ through N_(k) are independent of N₁ because these instructions do not need the data from R1. Thus, they may be processed concurrently while N₁ accesses the data in memory location R2.

The duration in which N₁ loads the data may be referred to as the resource access latency 101. FIG. 1 b is a diagram illustrating the execution of overlapping instructions. In this diagram, instruction N₁ loads data from memory location R2 into a register or memory R1 and sends signal s after the data is loaded. Instructions N₂ through N_(k) are executed concurrently with N₁. Instruction N_(k+2) depends on N₁ because N_(k+2) needs the data from memory or register R1. Consequently, instruction N_(k+1) waits 104 for the signal s from instruction N₁ and blocks all subsequent execution until the wait instruction is satisfied, that is, until the signal s is detected. Because instruction N₁ only asserts the signal s when it finishes loading the data from memory location R2 at 103, N_(k+2) is not executed until the signal s is cleared at 102. Subsequently, instruction N_(k+2) uses R1 in its execution at 105.

The instructions listed in FIG. 1 a may be run in a multithreaded environment where each thread handles one instruction. In such a scenario, threads communicate with other threads through shared resources such as global memory, registers, or signals. For example, signal s and registers R1 . . . R5 are shared resources accessible by instructions N₁ . . . N_(k+2). In many instances, a shared resource may only be accessed by one thread at a time; for example, until instruction N₁ asserts the signal s, no instruction from N_(k+2) onward may be executed. This duration is usually referred to as a critical section because instructions are executed in a mutually exclusive manner. A critical section may also be defined in terms of a program, where a computer programmer marks a part of the program as the critical section. For example, a critical section may begin before instruction N_(k+1), when it waits for signal s, and end after the assertion of signal s.

A conventional method to implement a critical section is to use an entry and an exit protocol. For example, a token or a signal may be used to permit entry into, or to indicate exit from, a critical section. An example of a token- or signal-based critical section is illustrated in FIG. 2, where a thread 202 waits for a token or signal 204 from a previous thread 201. After accessing its critical section, the thread 202 then passes another token or signal 205 to a thread 203. Before the thread 203 receives the token or signal 205, the thread 202 has exclusive access to a shared resource 210.
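The token-passing entry and exit protocol of FIG. 2 can be sketched with Python threads. This is a minimal illustration under stated assumptions, not an IXP implementation: one `threading.Event` per thread stands in for the token or signal, a shared list stands in for shared resource 210, and `run_token_ring` is a hypothetical name.

```python
import threading


def run_token_ring(num_threads: int) -> list[int]:
    """Hypothetical sketch: each thread waits for its token, enters its critical
    section, then passes the token to the next thread (entry/exit protocol)."""
    tokens = [threading.Event() for _ in range(num_threads)]  # one token per thread
    shared_log: list[int] = []  # stands in for shared resource 210

    def worker(i: int) -> None:
        tokens[i].wait()         # entry protocol: wait for the token from the previous thread
        shared_log.append(i)     # critical section: exclusive access to the shared resource
        if i + 1 < num_threads:
            tokens[i + 1].set()  # exit protocol: pass the token to the next thread

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    tokens[0].set()              # the first thread receives its token from outside
    for t in threads:
        t.join()
    return shared_log


print(run_token_ring(3))  # [0, 1, 2]
```

Because each thread can only enter after its predecessor has set its token, the critical sections run strictly one after another, which is exactly why any instruction that does not need exclusivity lengthens the time every later thread waits.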

When an instruction that blocks all subsequent execution, such as the wait instruction N_(k+1) in FIG. 1 a, is included in a critical section, the critical section becomes longer than necessary. Because the wait instruction already blocks all subsequent execution, a critical section is not needed to ensure exclusive access to a shared resource during that wait.

Contents of the Invention

A network processor may be idle while a network packet accesses a shared resource such as memory. The performance of a network processor can be improved if it can process a second network packet while the first packet accesses the shared resource. When the network processor processes multiple network packets in this way, the access latency is overlapped or hidden. The problem, therefore, is how to overlap or hide this latency to optimize network performance.

One embodiment of the invention includes a method to optimize a computer program that processes network packets by designing the computer program for a multithreaded environment and overlapping the resource access latency between different threads.

One embodiment of the invention organizes the computer program into a plurality of blocks, determines a critical section of the computer program, constructs a dependency graph, recognizes a portion of the computer program that could be executed outside of the critical section, and inserts a plurality of dependency relationships between the plurality of blocks to cause execution of the recognized portion of the computer program outside of the critical section.

An advantage of the embodied solutions is that threads may enter the critical section sooner than the computer program was originally designed to allow, thereby improving network performance.

BRIEF DESCRIPTION OF THE FIGURES

Various embodiments are illustrated by way of example and not by way of limitation in the figures of the accompanying drawings, in which like references indicate similar elements. It should be noted that references to "an," "one," or "various" embodiments in this disclosure are not necessarily to the same embodiment, and such references mean at least one.

FIG. 1 a depicts an example of a sequence of programming instructions.

FIG. 1 b illustrates an example of resource access latency overlapping based on the sequence of programming instructions listed in FIG. 1 a.

FIG. 2 illustrates an example of token- or signal-based critical sections.

FIG. 3 illustrates an example of moving instructions outside of a critical section to shorten the critical section according to one embodiment of the invention.

FIG. 4 is a flow chart describing the key operations in accordance with one embodiment of the invention.

FIG. 5 a is a control flow chart depicting a sequence of programming instructions based on blocks.

FIG. 5 b is a control flow chart depicting a sequence of programming instructions with super block organization.

FIG. 6 a is a dependency graph illustrating a sequence of programming instructions before rescheduling the instructions in accordance with one embodiment of the invention.

FIG. 6 b is a dependency graph illustrating a sequence of programming instructions after rescheduling the instructions in accordance with one embodiment of the invention.

FIG. 7 is a flow chart describing the detailed operations in constructing and reconstructing a dependency graph in accordance with one embodiment of the invention.

FIG. 8 a is a block diagram illustrating a dependency graph before adding pseudo termination points and dependency relationships in accordance with one embodiment of the invention.

FIG. 8 b is a block diagram illustrating a rescheduled dependency graph after adding pseudo termination points and dependency relationships in accordance with one embodiment of the invention.

FIG. 8 c is a block diagram illustrating a rescheduled dependency graph after adding additional dependency relationships in accordance with one embodiment of the invention.

DETAILED DESCRIPTIONS

A method for scheduling multithreaded programming instructions based on a dependency graph is described below. A person of ordinary skill in the pertinent art, upon reading the present disclosure, will recognize that various novel aspects and features of the present invention can be implemented independently or in any suitable combination, and further, that the disclosed embodiments are merely illustrative and not meant to be limiting.

FIG. 3 illustrates an example of moving instructions outside of a critical section to shorten the critical section according to one embodiment of the invention. In a token- or signal-based critical section as described above, thread 302 may wait until thread 301 exits a critical section 311 before thread 302 begins to execute its instructions in a critical section 312. A shaded block 350 represents the instructions blocked by a wait instruction 351. Since the wait instruction 351 already blocks all the subsequent instructions in 350, the instructions in 350 may be moved outside of the critical section 311 without affecting the order in which the instructions are executed.

When the wait instruction 351 is moved outside of the critical section 311, the critical section 311 may be shortened. As depicted in FIG. 3 b, a critical section 361 is shorter than the critical section 311 depicted in FIG. 3 a. As a result, thread 371 may release the critical section 361 to thread 372 sooner than thread 301 releases the critical section 311 to thread 302. In this embodiment of the invention, the wait instruction 351 is moved to a location indicated by 381, and the instructions blocked by the wait instruction, 350, are moved to a location indicated by 380. When critical sections are shortened as much as possible, a multithreaded program may be executed efficiently.

FIG. 4 is a generalized flow chart describing the key operations in accordance with one embodiment of the invention. Operation 401 constructs a dependency graph based on blocks, nodes, and super blocks. In one embodiment of the invention, the dependency graph is a logical representation of the programming instructions, where the programming instructions may be organized logically into blocks, nodes, and super blocks, as discussed in detail below.

Operation 402 determines all the critical sections included in the dependency graph. A dependency graph may represent all of the programming instructions or only a portion of them. If a dependency graph represents all of the programming instructions, the dependency graph may include all the critical sections. Conversely, if a dependency graph represents only part of the programming instructions, its logical organization may include only portions of some critical sections. In one embodiment of the invention, a critical section may begin before entering the partial programming instructions represented by the dependency graph and end before exiting the partial programming instructions. In another embodiment of the invention, a critical section may begin after entering the partial programming instructions and end subsequent to exiting the partial programming instructions.

In one embodiment of the invention, a partial set of programming instructions is treated as a complete program. Therefore, an open-ended critical section may not be processed correctly. Termination points may be added to the dependency graph to ensure the completeness of the program (operation 403).

Operation 404 determines whether the programming instructions in the critical sections could be executed outside of the critical sections. If so, operation 405 schedules these programming instructions outside of the critical sections by inserting dependency relationships that ensure these instructions are not executed during the critical sections. After rescheduling, in operation 406, a reconstructed dependency graph is formed.

FIG. 5 a is a control flow graph depicting a plurality of blocks representing a grouping of a sequence of programming instructions. In 501, programming instructions are organized only at the block level. In one embodiment of the invention, each block may include a sequence of programming instructions. In another embodiment of the invention, each block may include only a single instruction. For example, blocks b3 and b4 may include multiple programming instructions and block b2 may contain a single programming instruction.

The sequence of programming instructions may also be organized or grouped by methods other than blocks. In one embodiment of the invention, the programming instructions may be organized or grouped based on different program constructs, such as a single instruction, a conditional construct, or a loop construct. In this embodiment of the invention, such a grouping may be referred to as a node. In another embodiment of the invention, the nodes may in turn be organized or grouped into super blocks.

FIG. 5 b is a control flow graph depicting a sequence of programming instructions with super block organization. In one embodiment of the invention, a super block may contain a sequence of nodes or blocks. In the figure, diagram 511 depicts a super block overview based on several conditional constructs and a loop construct. One of the conditional constructs includes blocks b2, b3, and b4, and another conditional construct includes blocks b1, b2, and b6. The loop construct includes blocks b7 and b8.

As discussed before, the programming instructions may also be organized by super blocks. As an example, the blocks in FIG. 5 a may be organized in at least four different ways based on super blocks. The first super block, s1, would include node 513 and node 514. Node 513 includes blocks b1, b2, b3, b4, b5, b6, b7, b8, and b9, and node 514 includes block b10. Node 513 is a conditional construct and node 514 may be a single instruction. The second example of a super block, 521, may include node 522 and node 523. Node 522 may include blocks b2, b3, and b4, and node 523 includes block b5. In this example, node 522 is a conditional construct and node 523 may be a single instruction.

The third example of a super block, 515, may include node 516 and node 517. Node 516 may include blocks b6, b7, and b8, and node 517 may include block b9. In this example, node 516 is a loop construct and node 517 is a single instruction. The fourth example of a super block, 520, may include node 519 and node 518. Node 519 may include block b7 and node 518 may include block b8. In this example, nodes 519 and 518 are two single instructions.
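One possible in-memory representation of this block, node, and super block hierarchy is sketched below. The class names and the construct labels ("single", "conditional", "loop") are assumptions for illustration. The example encodes super block 521 from FIG. 5 b; the instruction contents of its blocks are not given in the text, so they are left empty.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    name: str
    instructions: list[str] = field(default_factory=list)  # contents unknown here, left empty


@dataclass
class Node:
    kind: str            # assumed labels: "single", "conditional", or "loop"
    blocks: list[Block]


@dataclass
class SuperBlock:
    nodes: list[Node]

    def block_names(self) -> list[str]:
        # Flatten the super block back into its blocks, in program order.
        return [b.name for node in self.nodes for b in node.blocks]


# Super block 521: node 522 (conditional over b2-b4) followed by node 523 (b5).
sb_521 = SuperBlock([
    Node("conditional", [Block("b2"), Block("b3"), Block("b4")]),
    Node("single", [Block("b5")]),
])
print(sb_521.block_names())  # ['b2', 'b3', 'b4', 'b5']
```

Keeping nodes as an explicit layer lets a scheduler move a whole construct (for example, an entire conditional) as one unit, which is how the dependency graphs in FIG. 6 and FIG. 8 treat them.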

FIG. 6 a is a dependency graph illustrating a sequence of programming instructions before scheduling in accordance with one embodiment of the invention. For a sequence of programming instructions such as:

b1: CSBegin; R1=[R2]; signal s1

b2: Wait s1; R3=R1

b3: Wait s1

b4: R5=R6

b5: CSEnd

where CSBegin indicates the beginning of a critical section and CSEnd indicates the end of a critical section, a dependency graph such as the one depicted in FIG. 6 a may be used to illustrate the sequence. In one embodiment of the invention, the dependency graph may be organized in blocks, nodes, and super blocks. For example, node 1 may include block 1, node 2 may include blocks 2, 3, and 4, and node 3 may include block 5, shown in FIG. 6 a as node 601, node 607, and node 606, respectively.

A super block 603 may include node 601, node 607, and node 606. In the above example of the programming instructions, blocks 3 and 4 are independent of block 1 because block 3 does nothing other than wait for s1 and block 4 can be executed independently of block 1. Although block 2 depends on block 1, as long as block 2 is executed after block 1, block 2 will use the correct R1, because before block 2 can execute the instruction R3=R1, it must wait for signal s1 from block 1.
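The register-level reasoning above can be mechanized by comparing the registers each block reads and writes. This is a sketch under stated assumptions: `RegBlock` and `register_dependencies` are hypothetical names, `[R2]` is treated as a read of R2, and signal waits and the CSBegin/CSEnd markers (including block 5) are deliberately left out, since the wait instruction itself enforces ordering at run time, as described for block 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegBlock:
    name: str
    reads: frozenset[str]
    writes: frozenset[str]


def register_dependencies(blocks: list[RegBlock]) -> set[tuple[str, str]]:
    """Return (earlier, later) pairs where `later` must stay after `earlier`."""
    deps: set[tuple[str, str]] = set()
    for i, earlier in enumerate(blocks):
        for later in blocks[i + 1:]:
            raw = earlier.writes & later.reads   # read after write
            war = earlier.reads & later.writes   # write after read
            waw = earlier.writes & later.writes  # write after write
            if raw or war or waw:
                deps.add((earlier.name, later.name))
    return deps


blocks = [
    RegBlock("b1", frozenset({"R2"}), frozenset({"R1"})),  # R1=[R2]
    RegBlock("b2", frozenset({"R1"}), frozenset({"R3"})),  # R3=R1
    RegBlock("b3", frozenset(), frozenset()),              # Wait s1 only
    RegBlock("b4", frozenset({"R6"}), frozenset({"R5"})),  # R5=R6
]
print(register_dependencies(blocks))  # {('b1', 'b2')}
```

The result agrees with the text: only block 2 depends on block 1 (through R1), while blocks 3 and 4 are free to move relative to it.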

Knowing the dependency relationships between the blocks enables the programming instructions to be rescheduled so that the critical section is shortened. In turn, a shortened critical section may permit other threads to access the critical section sooner than the programming instructions originally allowed. In one embodiment of the invention, block 5 at 606 may be moved as shown in FIG. 6 b. FIG. 6 b is a dependency graph illustrating a sequence of programming instructions after scheduling, where block 1 611 remains the beginning of a critical section and super block 617, including blocks 2, 3, and 4, is moved subsequent to block 5 at 616. When the programming instructions are executed based on the dependency graph illustrated in FIG. 6 b, the instructions may be executed in the order described by the dependency graph. In this example, the critical section ends at block 5 at 616. Other threads waiting to access the critical section may acquire access to the global memory at this time.
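The FIG. 6 b ordering can be checked with a topological sort. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+). The edge set is an assumed reading of FIG. 6 b: block 5 (CSEnd) follows block 1 and precedes blocks 2, 3, and 4, plus the data dependency of block 2 on block 1. As a simplification, the critical-section span is counted in blocks from block 1 through block 5.

```python
from graphlib import TopologicalSorter

# Predecessor sets read from FIG. 6b: each key lists the blocks that must run before it.
fig6b_predecessors = {
    "b1": set(),          # CSBegin; R1=[R2]; signal s1
    "b5": {"b1"},         # CSEnd now directly follows block 1
    "b2": {"b1", "b5"},   # needs R1 from block 1 and is scheduled after CSEnd
    "b3": {"b5"},
    "b4": {"b5"},
}

order = list(TopologicalSorter(fig6b_predecessors).static_order())
# Number of blocks from CSBegin (in block 1) through CSEnd (block 5), inclusive.
cs_span = order.index("b5") - order.index("b1") + 1
print(order[:2], cs_span)  # ['b1', 'b5'] 2
```

In the original program order of FIG. 6 a (b1 through b5), the same measure would be 5 blocks; after rescheduling, only block 1 and block 5 lie inside the critical section, and the relative order of blocks 2, 3, and 4 is left to the scheduler.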

FIGS. 7 and 8 a-c are discussed together. FIG. 7 is a flow chart describing the detailed operations in constructing an initial dependency graph and reconstructing the dependency graph. In operation 701, a dependency graph is constructed based on blocks, nodes, and super blocks. As discussed above, blocks may include a sequence of programming instructions, nodes may represent programming constructs, and super blocks may include a sequence of blocks. FIGS. 8 a-c are used to illustrate the operations described in FIG. 7.

FIG. 8 a is a diagram illustrating the initial dependency graph 800 produced by operation 701 for exemplary programming instructions. In one embodiment of the invention, node 810 may include partial programming instructions, a CSEnd1, a CSBegin2, a CSEnd2, and a CSBegin3. Node 811 may include a single instruction, wait s1. Node 812 may include a CSEnd3, and node 813 may include a CSBegin4. In addition, links 820 and 821 indicate that node 810 may be executed before node 812, and node 812 before node 813.

In constructing the initial dependency graph 800, the dependency relationships between the programming instructions within a node are eliminated in operation 702. In one embodiment of the invention, a node may include multiple programming instructions but may be depicted in the dependency graph as a single block. For example, node 811 includes two programming instructions, namely, Wait s1 and R3=R1, but the node is illustrated in the initial dependency graph 800 as a single node, even though instruction R3=R1 depends on instruction Wait s1. If a node includes only a single instruction, there is no such dependency relationship, and operation 702 may be skipped.

Subsequent to constructing the initial dependency graph 800 in operation 701, operation 702 determines the critical sections associated with the initial dependency graph 800 and inserts appropriate pseudo termination points. As discussed above, a dependency graph may represent all of the programming instructions or only a portion of them. If the dependency graph represents only part of the programming instructions, its logical organization may include only portions of some critical sections.

In one embodiment of the invention, if a critical section begins before entering the partial programming instructions and ends before exiting the partial programming instructions, a termination point may be inserted into the rescheduled dependency graph. This is depicted in FIG. 8 a, where node 810 includes a CSEnd1 but the dependency graph 800 does not include a CSBegin1. This indicates that critical section 1 begins before this portion of the programming instructions is entered. In this embodiment of the invention, a termination point may be added to mark the beginning of the critical section. For example, in FIG. 8 b, a pseudo CSBegin1 854 may be inserted in the rescheduled dependency graph 850.

In another embodiment of the invention, if the critical section begins after entering the partial programming instructions but ends after exiting them, a termination point may be inserted into the rescheduled dependency graph. This is depicted in FIG. 8 a, where node 813 includes a CSBegin4 but the dependency graph 800 does not include a CSEnd4. This indicates that critical section 4 begins after entering the partial programming instructions but ends after exiting them. In this embodiment of the invention, a termination point may be inserted to signify the end of the critical section. For example, in FIG. 8 b, a pseudo CSEnd4 855 may be inserted in the reconstructed dependency graph 850.
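Detecting where pseudo termination points are needed amounts to matching CSBegin and CSEnd markers within the partial program. The sketch below assumes markers are spelled like `CSBegin2` and `CSEnd2`, as in FIG. 8 a, and treats any other entry as an ordinary instruction; `pseudo_termination_points` is a hypothetical helper.

```python
import re


def pseudo_termination_points(markers: list[str]) -> tuple[list[str], list[str]]:
    """Return (pseudo CSBegins to add at entry, pseudo CSEnds to add at exit)."""
    open_sections: list[str] = []  # begun inside the region, not yet ended
    begun_before: list[str] = []   # ended inside the region, but begun before it
    for m in markers:
        match = re.fullmatch(r"CS(Begin|End)(\w+)", m)
        if match is None:
            continue  # ordinary instruction, not a critical-section marker
        kind, cs_id = match.groups()
        if kind == "Begin":
            open_sections.append(cs_id)
        elif cs_id in open_sections:
            open_sections.remove(cs_id)  # matched pair: nothing to add
        else:
            begun_before.append(cs_id)   # like CSEnd1 in FIG. 8a
    return (["CSBegin" + c for c in begun_before],
            ["CSEnd" + c for c in open_sections])  # like CSBegin4 in FIG. 8a


# Markers of nodes 810, 811, 812, and 813 in program order.
fig8a = ["CSEnd1", "CSBegin2", "CSEnd2", "CSBegin3", "wait s1", "CSEnd3", "CSBegin4"]
print(pseudo_termination_points(fig8a))  # (['CSBegin1'], ['CSEnd4'])
```

Critical sections 2 and 3 begin and end inside the region, so only critical sections 1 and 4 receive pseudo termination points, matching CSBegin1 854 and CSEnd4 855.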

Subsequently, operation 703 inserts relevant dependency relationships between the blocks in the reconstructed dependency graph. In one embodiment of the invention, dependency relationships 861, 862, 863, 864, and 865 may be inserted to ensure that CSBegin1 854 is executed before all other nodes in the reconstructed dependency graph. Similarly, dependency relationships 831, 832, 833, 834, and 835 may be inserted to ensure that CSEnd4 855 is executed after all other nodes in the reconstructed dependency graph.
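Ordering a pseudo CSBegin before every node and every node before a pseudo CSEnd can be sketched as below. The node names reuse the FIG. 8 a labels, the edges 810→812 and 812→813 stand for links 820 and 821, and `add_pseudo_edges` is a hypothetical helper; the exact set of relationships 861-865 and 831-835 in FIG. 8 b may differ from this simplified version.

```python
def add_pseudo_edges(nodes: list[str], edges: set[tuple[str, str]],
                     pseudo_begin: str, pseudo_end: str) -> set[tuple[str, str]]:
    """Return a copy of `edges` with pseudo_begin ordered first and pseudo_end last."""
    result = set(edges)
    for n in nodes:
        result.add((pseudo_begin, n))  # pseudo CSBegin precedes every node
        result.add((n, pseudo_end))    # every node precedes pseudo CSEnd
    result.add((pseudo_begin, pseudo_end))
    return result


nodes = ["810", "811", "812", "813"]
links = {("810", "812"), ("812", "813")}  # links 820 and 821
graph = add_pseudo_edges(nodes, links, "CSBegin1", "CSEnd4")
print(len(graph))  # 11
```

The result has 2 original links, 4 edges out of the pseudo CSBegin, 4 edges into the pseudo CSEnd, and one edge between the two pseudo points.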

Operation 704 inserts additional dependency relationships to optimize efficiency during memory latency. In one embodiment of the invention, three types of dependency relationships may be added to the reconstructed dependency graph 880. The first type may be referred to as a direct dependency. FIG. 8 c is a diagram illustrating a reconstructed dependency graph with additional dependency relationships. In FIG. 8 c, direct dependency 891 may be added because node 871 directly depends on node 876. A direct dependency relationship is inserted if a node depends on a CSBegin or if a CSEnd depends on the node. In this example, node 871 depends on node 870 because wait s1 may be executed outside of critical section 1. By inserting the dependency relationship 891, the instructions in node 871 may be executed after CSEnd1 in node 876 is executed. Consequently, the instructions in node 871 are scheduled out of critical section 1, which shortens critical section 1.

The second type of dependency relationship may be referred to as an indirect dependency. In FIG. 8 c, the indirect dependency relationship 892 is inserted between node 872 and node 871. This type of dependency relationship may be inserted if a node can be scheduled out of other critical sections. In this example, node 872 is independent of critical sections 2 and 3; therefore, node 872 may be scheduled for execution after these two critical sections have run. By inserting an indirect dependency relationship from node 872 to node 871, the reconstructed dependency graph 870 describes that node 872 may be executed before node 871 is executed. This ensures that the durations of critical sections 2 and 3 are, again, shortened.

The third type of dependency relationship may be referred to as the shortest lifetime dependency. In FIG. 8 c, the shortest lifetime dependency relationship 893 is inserted from node 871 to node 873. This type of dependency relationship functions like a stopper: it may be inserted into the reconstructed dependency graph 870 to stop a moving node after it has been scheduled outside of the critical sections. In this example, after node 871 has been successfully scheduled outside of critical sections 2 and 3, the shortest lifetime dependency relationship 893 ensures that node 871 is executed before the end of the reconstructed dependency graph.

An example of pseudo code according to an embodiment of the invention, in which the method to insert additional dependency relationships into the dependency graph may be implemented, is provided below.

construct initial dependency graph, DG;   // containing super block, nodes, and blocks
construct initial transitive closure of DG, H, and its inverse, Hinv;
insert pseudo CSBegin and CSEnd in the super block;
do {
    changed = false;
    for each node n in the super block
        for each (CSBegin, CSEnd) in the super block {   // b = CSBegin, e = CSEnd
            if (e ∈ H[n] && n ∉ H[b] && b ∉ H[n]) {
                changed = true;
                add (n -> b) into DG;
                ∀ x ∈ {b} ∪ H[b], Hinv[x] |= {n} ∪ Hinv[n];
                ∀ x ∈ {n} ∪ Hinv[n], H[x] |= {b} ∪ H[b];
            } else if (n ∈ H[b] && n ∉ H[e] && e ∉ H[n]) {
                changed = true;
                add (e -> n) into DG;
                ∀ x ∈ {e} ∪ Hinv[e], H[x] |= {n} ∪ H[n];
                ∀ x ∈ {n} ∪ H[n], Hinv[x] |= {e} ∪ Hinv[e];
            }
        }
} while (changed);
for each node n in the super block
    for each (CSBegin, CSEnd) in the super block
        if (n ∉ H[b] && b ∉ H[n] && n ∉ H[e] && e ∉ H[n])
            DG[n] = b;
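A Python transcription of the pseudo code above is sketched below, under several assumptions: H[x] is taken to be the set of nodes reachable from x in DG (nodes that must execute after x); a section's own CSBegin and CSEnd are never treated as movable nodes (otherwise the first test would add a self-edge b to b); and the final `DG[n] = b` step, whose meaning is not spelled out, is only reported as a list of unordered (node, CSBegin) pairs. The function names and the example nodes are hypothetical.

```python
def transitive_closure(nodes, edges):
    """H[x] = set of nodes reachable from x along edges (u, v) meaning u before v."""
    succ = {x: [] for x in nodes}
    for u, v in edges:
        succ[u].append(v)
    H = {}
    for x in nodes:
        seen, stack = set(), list(succ[x])
        while stack:
            y = stack.pop()
            if y not in seen:
                seen.add(y)
                stack.extend(succ[y])
        H[x] = seen
    return H


def shorten_critical_sections(nodes, edges, sections):
    """Add edges that push nodes out of each (CSBegin, CSEnd) pair; return (edges, unordered)."""
    nodes, edges = list(nodes), set(edges)
    H = transitive_closure(nodes, edges)
    Hinv = {x: set() for x in nodes}
    for x in nodes:
        for y in H[x]:
            Hinv[y].add(x)

    changed = True
    while changed:
        changed = False
        for n in nodes:
            for b, e in sections:
                if n in (b, e):
                    continue  # assumption: a section's own markers never move
                if e in H[n] and n not in H[b] and b not in H[n]:
                    edges.add((n, b))  # n precedes CSEnd anyway: hoist it above CSBegin
                    before, after = {n} | Hinv[n], {b} | H[b]
                elif n in H[b] and n not in H[e] and e not in H[n]:
                    edges.add((e, n))  # n follows CSBegin anyway: sink it below CSEnd
                    before, after = {e} | Hinv[e], {n} | H[n]
                else:
                    continue
                # Keep the closure exact: everything in `before` now reaches everything in `after`.
                for x in before:
                    H[x] |= after
                for x in after:
                    Hinv[x] |= before
                changed = True

    unordered = [(n, b) for n in nodes for b, e in sections
                 if n not in (b, e) and n not in H[b] and b not in H[n]
                 and n not in H[e] and e not in H[n]]
    return edges, unordered


nodes = ["prep", "CSBegin", "load", "b2", "b3", "CSEnd"]
edges = {("prep", "load"), ("CSBegin", "load"), ("load", "CSEnd"), ("load", "b2")}
new_edges, unordered = shorten_critical_sections(nodes, edges, [("CSBegin", "CSEnd")])
print(sorted(new_edges - edges))  # [('CSEnd', 'b2'), ('prep', 'CSBegin')]
print(unordered)                  # [('b3', 'CSBegin')]
```

In the example, `prep` (which feeds the in-section `load`) is hoisted above CSBegin, `b2` (which consumes the result of `load`) is sunk below CSEnd, and the unrelated `b3` is left unconstrained. Each added edge only joins nodes that were previously unordered, so the graph stays acyclic and the loop terminates once no pair qualifies.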

As discussed previously, the performance of a network application designed for a multithreaded environment can be improved if more threads can be processed simultaneously. A network system may include a network processor, such as the Intel Internet eXchange Processor (IXP), that is capable of Ethernet data processing. The network system may communicate with other systems in the network via its network interfaces and may also be referred to as a fabric. A fabric receives and distributes the transmitted data from a transmitter to the fabric. Network transmissions may be wired or wireless, based on network standards known in the art, such as Ethernet cable, fiber optic transmission, 802.11 standards, or satellite transmission.

One embodiment of the invention may be implemented on a machine-readable medium. A machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), including but not limited to Compact Disc Read-Only Memory (CD-ROM), Read-Only Memory (ROM), Random Access Memory (RAM), Erasable Programmable Read-Only Memory (EPROM), and a transmission over the Internet.

The embodiments of the invention have been described in the context of network packet processing; however, it is to be understood that other computer applications may utilize the embodiments described herein. For example, computer systems for product shipment, inventory processing, or airline flight routing may utilize the embodiments described herein.

Although the embodiments of the invention have been described in detail hereinabove, it should be appreciated that many variations and/or modifications and/or alternative embodiments of the basic inventive concepts taught herein that may appear to those skilled in the pertinent art will still fall within the spirit and scope of the invention as defined in the appended claims.

1. A computer implemented method for rearranging a computer program comprising: organizing the computer program into a plurality of blocks; determining a critical section of the computer program; constructing a dependency graph based on the organization of the computer program; recognizing a portion of the computer program that could be executed outside of the critical section; and inserting a plurality of dependency relationships between the plurality of blocks to cause execution of the recognized portion of the computer program outside of the critical section.
2. The method of claim 1, wherein a block includes a computer program instruction.
3. The method of claim 1, further comprising organizing the computer program based on a node and a super block, wherein the node includes a plurality of blocks and the super block includes a plurality of nodes.
4. The method of claim 1, wherein the critical section of the computer program accesses shared resources.
5. The method of claim 1, further comprising determining the extent to which the critical section is part of the dependency graph.
6. The method of claim 5, further comprising adding a termination point to the critical section if a portion of the critical section is outside of the dependency graph.
7. The method of claim 1, further comprising inserting additional dependency relationships based on a direct dependency, an indirect dependency, or a shortest lifetime dependency.
8. The method of claim 1, further comprising scheduling execution of the computer program based on the dependency graph.
9. A computer implemented system for rearranging a computer program comprising: a computer program organizer, wherein the organizer organizes the computer program into a plurality of blocks; a critical section determination module; a dependency graph construction module, wherein a dependency graph is constructed based on the organization of the computer program; and a dependency relationship inserter, wherein dependency relationships are inserted between the plurality of blocks to cause execution of a recognized portion of the computer program outside of a critical section.
10. The system of claim 9, wherein the critical section determination module determines the extent to which the critical section is part of the dependency graph.
11. The system of claim 9, wherein the critical section of the computer program accesses shared resources.
12. The system of claim 11, wherein the dependency relationship inserter inserts a termination point to the critical section if a portion of the critical section is outside of the dependency graph.
13. A system for processing a plurality of network packets comprising: a network processor; a network interface to control the transmission between the network processor and a network; a shared resource accessible to the plurality of network packets; a network processor program to process the plurality of network packets; a dependency graph constructor to construct a dependency graph based on the network processor program; and a dependency relationship inserter to optimize the network processor program by inserting a plurality of dependency relationships to rearrange the order in which the network processor program is executed.
14. The system of claim 13, wherein the dependency graph constructor further determines a critical section and the extent to which the critical section is part of the dependency graph.
15. The system of claim 13, wherein the dependency relationship inserter inserts additional dependency relationships based on a direct dependency, an indirect dependency, or a shortest lifetime dependency.
16. A machine-accessible medium that provides instructions that, when executed by a processor, cause the processor to: organize a computer program based on a plurality of blocks; determine a critical section of the computer program; construct a dependency graph based on the organization of the computer program; recognize a portion of the computer program that could be executed outside of the critical section; and insert a plurality of dependency relationships between the plurality of blocks to cause execution of the recognized portion of the computer program outside of the critical section.
17. The machine-accessible medium of claim 16, wherein the critical section of the computer program accesses shared resources.
18. The machine-accessible medium of claim 16, further comprising inserting a termination point to the critical section if a portion of the critical section is outside of the computer program.
19. The machine-accessible medium of claim 16, further comprising inserting dependency relationships based on a direct dependency, an indirect dependency, or a shortest lifetime dependency.